Setup

1

1(a)

## [1] "The header of lung cancers: "
##       x     y       lon      lat
## 1 35900 41730 -2.621759 53.65058
## 2 35310 42690 -2.712468 53.73637
## 3 35340 42250 -2.707256 53.69685
## 4 35900 41760 -2.621798 53.65328
## 5 35780 41620 -2.639762 53.64060
## 6 35320 42700 -2.710967 53.73727
## [1] 978
## [1] "The number of lung cancers is 978"
## [1] "The header of larynx cancers: "
##       x     y       lon      lat
## 1 35320 42800 -2.711119 53.74626
## 2 35310 42230 -2.711769 53.69502
## 3 34910 41850 -2.771718 53.66050
## 4 35260 42080 -2.719111 53.68150
## 5 35300 42150 -2.713162 53.68782
## 6 35230 42660 -2.724548 53.73360
## [1] 58
## [1] "The number of larynx cancers is 58"
## [1] "The header of incinerator: "
##       x     y       lon      lat
## 1 35450 41360 -2.689291 53.61696
## [1] 1
## [1] "The number of incinerator is 1"

1(b)

1(c)

The general geographic location of data is at Leyland, Chorley and Penwortham. From the bounding box, the polygon “window” (the green box) encloses all observed points, spanning roughly 15–20 km in both the north–south and east–west directions

  • Lung Cancer (red points): Their count is the largest and scattered throughout the whole domain. They are clustered around the centres of Leyland, Chorley and Penwortham

  • Larynx Cancer (blue points): These points are fewer. There are less obvious clusters around the centres of Leyland, Chorley and Penwortham

  • Incinerator (black point): There is only one incinerator, and it is situated near the southwestern boundary of the bounding window. At first glance, there is not any obvious pattern of cancer cases around the incinerator.

2(b)

Setup

Homogeneous Poisson

Inhomogeneous Poisson

log-Gaussian Cox cluster

simple sequential inhibition process

Assumption & Intensities

Homogeneous Poisson

  • Assumption: Points occur randomly and independently with a constant rate λ

  • Intensity: λ(x,y) = constant.

Inhomogeneous Poisson

  • Assumption: Points still occur independently. The number of events in two non-overlapping regions are independent, regardless of differences in intensity.

  • Intensity: λ(s) = \(\hat{\lambda}(s) = \frac{1}{h^2} \sum_i \kappa\left(\frac{\|s - s_i\|}{h}\right) / q(\|s\|)\), a known function of location

Log-Gaussian Cox Poisson

  • Assumptions: Points are Poisson conditional on a latent Gaussian field Z(x), so the log-intensity is random and spatially correlated. Cox processes are considered doubly stochastic, intensity is heterogeneous, but also may be a random quantity

  • Intensity: λ(x)=exp⁡(Z(x)). Often leads to clustering.

Simple Sequential Inhibition (SSI)

  • Assumption: Points are placed sequentially; each new point must be at least ‘r’ away from existing points.

  • Intensity: No simple closed form

3

What is CRS:

  • CRS means uniform distribution. Any location has equal probability of containing a point.

  • CRS is modeled by a homogeneous Poisson process, with constant rate parameter.

  • As for independence, the occurrence of one point does not have any influence on the likelihood of a point in another locations.

  • Given that there are N points of the Poisson process in area A, these N points are conditionally independent and uniformly distributed in A

lung Cancer

Ripley’s K-function with L function adjustment

As we can see, empirical values is a lot larger than our theoretical ones, and lie above and outside the envelope. This suggests a clustering pattern.

G function

As we can see, empirical values is larger than our theoretical ones, and lie above and outside the envelope . This means there is clustering.

Larynx Cancer

Ripley’s K-function with L function adjustment

As we can see, empirical values is above our theoretical ones, and almost within the envelope. This suggest a relatively clustering pattern.

G function

As we can see, empirical values is above our theoretical ones, and almost within the envelope. This suggest a relatively random pattern.

Comparison

Ripley’s K-function:

  • K(r) tests the expected number of events within distance h from an arbitrary event (excluding the chosen event itself) divided by the average number of events per unit area. K(r) = E(N0(r))/λ

  • For a process that is more regular than CSR we expect fewer events within distance r of a randomly chosen event. For a process that is more clustered than CSR we expect more events within distance r of a randomly chosen event

  • K(r) is equivalent to showing the variance of the number of events occurring in subregion A

  • Under CSR K(r) = πr2, the area of a circle of radius r

  • Deviations above the horizontal line indicate clustering because there are more events within distance h than expected. Deviations below the horizontal line indicate regularity because there are fewer events within distance h than expected.

  • L version: plot \((h) \text{ vs } \hat{L}(h) - h \text{ where } \hat{L}(h) = \left(\frac{\hat{K}(h)_{ec}}{\pi}\right)^{1/2}\)

G function:

  • measures the distribution of distances from an arbitrary event to its nearest event (i.e. uses nearest neighbor distances).
  • Let G(r) = P(Di ≤ r), the CDF for the probability that the nearest neighbor distance is less than r
  • Let ˆG(r) be the proportion of observed points with nearest neighbors less than r.
  • Let hi be the distance from the ith event to the nearest other event in D, \(\hat{G}(h) = \frac{\#(h_i \leq h)}{n}\)
  • If ˆG(r) is much greater than G(r), that means there is clustering, whereas if it is smaller that means there is regularity.

4

Lung Cancer

Quadratcount

Quadrat.test

## 
##  Conditional Monte Carlo test of CSR using quadrat counts
##  Test statistic: Pearson X2 statistic
## 
## data:  
## X2 = 1854, p-value = 0.001
## alternative hypothesis: two.sided
## 
## Quadrats: 59 tiles (irregular windows)
## 
##  Conditional Monte Carlo test of CSR using quadrat counts
##  Test statistic: Pearson X2 statistic
## 
## data:  
## X2 = 1854, p-value = 5e-04
## alternative hypothesis: clustered
## 
## Quadrats: 59 tiles (irregular windows)
## 
##  Conditional Monte Carlo test of CSR using quadrat counts
##  Test statistic: Pearson X2 statistic
## 
## data:  
## X2 = 1854, p-value = 1
## alternative hypothesis: regular
## 
## Quadrats: 59 tiles (irregular windows)
Uniformly Distributed Clustered Regular Psrocesses
Result

X2 = 1854

p-value = 0.001

X2 = 1854

p-value = 5e-04

X2 = 1854

p-value = 1

Interpretation The null hypothesis, that points follow a uniform Poisson process is rejected. The alternative hypothesis, that points follow a clustering pattern is proved. So there is significant clustered pattern The alternative hypothesis, that points follow a regular pattern is not proved. So there is NO significant regular pattern

Larynx Cancer

Quadratcount

Quadrat.test

## 
##  Conditional Monte Carlo test of CSR using quadrat counts
##  Test statistic: Pearson X2 statistic
## 
## data:  
## X2 = 100.64, p-value = 0.058
## alternative hypothesis: two.sided
## 
## Quadrats: 59 tiles (irregular windows)
## 
##  Conditional Monte Carlo test of CSR using quadrat counts
##  Test statistic: Pearson X2 statistic
## 
## data:  
## X2 = 100.64, p-value = 0.0315
## alternative hypothesis: clustered
## 
## Quadrats: 59 tiles (irregular windows)
## 
##  Conditional Monte Carlo test of CSR using quadrat counts
##  Test statistic: Pearson X2 statistic
## 
## data:  
## X2 = 100.64, p-value = 0.975
## alternative hypothesis: regular
## 
## Quadrats: 59 tiles (irregular windows)
Uniformly Distributed Clustered Regular Psrocesses
Result

X2 = 100.64

p-value = 0.066

X2 = 100.64

p-value = 0.0255

X2 = 100.64

p-value = 0.975

Interpretation The null hypothesis, that points follow a uniform Poisson process is rejected at significant level of 7.5% The alternative hypothesis, that points follow a clustering pattern is proved. So there is significant clustered pattern The alternative hypothesis, that points follow a regular pattern is not proved. So there is no significant regular pattern

5

lung Cancer

At first, we need to know a Dirichlet tessellation is able to partition the plane so that each point’s “cell” includes all locations closer to it than to any other point. This can reveal how points partition their surrounding space.

Large cells imply that point is relatively isolated; small cells imply a local cluster. there are several areas of many small cells, meaning clusters. These clusters of small cells match the statistical evidence of Quadrat.test, K-function, and G-function

larynx Cancer

There are no strong evidence of concentration areas of many small cells. This result match with result of K-function and G-function, which indicate weaker or minimal clustering relative to lung cancer. However, this result does NOT align with the Quadrat.test result. One potentail reason is that Quadrat.test is more statistically rigorous, while Dirichlet Tesselation relies on eyeballing.

6

lung cancer

There are a few hot spots, indicating the local maxima of the kernel density. These hot spots represents the region with a relatively higher concentration of lung cancer cases, in other words, clustering. This results align with the results of K-function, G-function, and Quadrat.test.

## [1] -3.015641e-20  3.085959e-04

The maximum and minimum values of the intensity estimate over the study region is -3.0156e-20 to 3.0859e-04. The large difference between maximum and minimum indicates the existence of clustering

##    sigma 
## 72.77397
## real-valued pixel image
## 128 x 128 pixel array (ny, nx)
## enclosing rectangle: [346600, 364100] x [412600, 430300] units
## dimensions of each pixel: 137 x 138.2812 units
## Image is defined on a subset of the rectangular grid
## Subset area = 227037826.538086 square units
## Subset area fraction = 0.733
## Pixel values (inside window):
##  range = [-3.015641e-20, 0.0003085959]
##  integral = 973.8974
##  mean = 4.289582e-06

The optimal bandwidth is 72.77397, and 4.2895e-06 is the average intensity (points per square unit) inside the window. The value integral(973.8974) is approximately equal to the number of lung cancers, which means correct implementation.

larynx cancer

There are a few hot spots, indicating the local maxima of the kernel density. These hot spots represents the region with a relatively higher concentration of larynx cancer cases, in other words, clustering. This results align with the results of K-function, G-function, and Quadrat.test.

## [1] 0.000000e+00 1.810377e-06

The maximum and minimum values of the intensity estimate over the study region is 0.00 to 1.8103e-06. The large difference between maximum and minimum indicates the existence of clustering

##    sigma 
## 642.1233
## real-valued pixel image
## 128 x 128 pixel array (ny, nx)
## enclosing rectangle: [346600, 364100] x [412600, 430300] units
## dimensions of each pixel: 137 x 138.2812 units
## Image is defined on a subset of the rectangular grid
## Subset area = 227037826.538086 square units
## Subset area fraction = 0.733
## Pixel values (inside window):
##  range = [0, 1.810377e-06]
##  integral = 58.59965
##  mean = 2.581052e-07

The optimal bandwidth is 642.1233, and 2.5810e-07 is the average intensity (points per square unit) inside the window. Also, the value integral(58.59965) is approximately equal to the number of lung cancers, which means correct implementation.

7

Lung Cancer

  1. no covariates
## Point process model
## Fitted to data: lung_ppp
## Fitting method: maximum likelihood
## Model was fitted analytically
## Call:
## ppm.formula(Q = lung_ppp ~ 1, interaction = Poisson())
## Edge correction: "border"
##  [border correction distance r = 0 ]
## --------------------------------------------------------------------------------
## Quadrature scheme (Berman-Turner) = data + dummy + weights
## 
## Data pattern:
## Planar point pattern:  978 points
## Average intensity 4.31e-06 points per square unit
## Window: polygonal boundary
## single connected closed polygon with 11 vertices
## enclosing rectangle: [346600, 364100] x [412600, 430300] units
##                      (17500 x 17700 units)
## Window area = 226985000 square units
## Fraction of frame area: 0.733
## 
## Dummy quadrature points:
##      70 x 70 grid of dummy points, plus 4 corner points
##      dummy spacing: 250.0000 x 252.8571 units
## 
## Original dummy parameters: =
## Planar point pattern:  3659 points
## Average intensity 1.61e-05 points per square unit
## Window: polygonal boundary
## single connected closed polygon with 11 vertices
## enclosing rectangle: [346600, 364100] x [412600, 430300] units
##                      (17500 x 17700 units)
## Window area = 226985000 square units
## Fraction of frame area: 0.733
## Quadrature weights:
##      (counting weights based on 70 x 70 array of rectangular tiles)
## All weights:
##  range: [4270, 63200]    total: 2.27e+08
## Weights on data points:
##  range: [4270, 31600]    total: 17800000
## Weights on dummy points:
##  range: [4270, 63200]    total: 2.09e+08
## --------------------------------------------------------------------------------
## FITTED :
## 
## Stationary Poisson process
## 
## ---- Intensity: ----
## 
## 
## Uniform intensity:
## [1] 4.308655e-06
## 
##              Estimate       S.E.   CI95.lo   CI95.hi Ztest      Zval
## log(lambda) -12.35488 0.03197647 -12.41756 -12.29221   *** -386.3742
## 
## ----------- gory details -----
## 
## Fitted regular parameters (theta):
## log(lambda) 
##   -12.35488 
## 
## Fitted exp(theta):
##  log(lambda) 
## 4.308655e-06
## log(lambda) 
##   -12.35488
## Nothing plotted -- all plots selected are flat surfaces.
## [1] 26124.15
  • Since the intensity is constant, the homogeneous Poisson point possess is being fit.

  • The uniform intensity is 4.30e-06, meaning you expect 4.30e-06 lung cancer cases per unit area

  • AIC is 26124.15

  1. linear trend
## Point process model
## Fitted to data: lung_ppp
## Fitting method: maximum likelihood (Berman-Turner approximation)
## Model was fitted using glm()
## Algorithm converged
## Call:
## ppm.ppp(Q = lung_ppp, trend = ~log(x) + log(y), interaction = Poisson())
## Edge correction: "border"
##  [border correction distance r = 0 ]
## --------------------------------------------------------------------------------
## Quadrature scheme (Berman-Turner) = data + dummy + weights
## 
## Data pattern:
## Planar point pattern:  978 points
## Average intensity 4.31e-06 points per square unit
## Window: polygonal boundary
## single connected closed polygon with 11 vertices
## enclosing rectangle: [346600, 364100] x [412600, 430300] units
##                      (17500 x 17700 units)
## Window area = 226985000 square units
## Fraction of frame area: 0.733
## 
## Dummy quadrature points:
##      70 x 70 grid of dummy points, plus 4 corner points
##      dummy spacing: 250.0000 x 252.8571 units
## 
## Original dummy parameters: =
## Planar point pattern:  3659 points
## Average intensity 1.61e-05 points per square unit
## Window: polygonal boundary
## single connected closed polygon with 11 vertices
## enclosing rectangle: [346600, 364100] x [412600, 430300] units
##                      (17500 x 17700 units)
## Window area = 226985000 square units
## Fraction of frame area: 0.733
## Quadrature weights:
##      (counting weights based on 70 x 70 array of rectangular tiles)
## All weights:
##  range: [4270, 63200]    total: 2.27e+08
## Weights on data points:
##  range: [4270, 31600]    total: 17800000
## Weights on dummy points:
##  range: [4270, 63200]    total: 2.09e+08
## --------------------------------------------------------------------------------
## FITTED :
## 
## Nonstationary Poisson process
## 
## ---- Intensity: ----
## 
## Log intensity: ~log(x) + log(y)
## 
## Fitted trend coefficients:
## (Intercept)      log(x)      log(y) 
##   92.598926   -7.148775   -1.048353 
## 
##              Estimate      S.E.    CI95.lo    CI95.hi Ztest       Zval
## (Intercept) 92.598926 54.031934 -13.301718 198.499571        1.7137815
## log(x)      -7.148775  2.653131 -12.348817  -1.948733    ** -2.6944670
## log(y)      -1.048353  3.114080  -7.151839   5.055132       -0.3366494
## 
## ----------- gory details -----
## 
## Fitted regular parameters (theta):
## (Intercept)      log(x)      log(y) 
##   92.598926   -7.148775   -1.048353 
## 
## Fitted exp(theta):
##  (Intercept)       log(x)       log(y) 
## 1.641356e+40 7.858262e-04 3.505145e-01
## (Intercept)      log(x)      log(y) 
##   92.598926   -7.148775   -1.048353
## [1] 26118.86
  • \(\lambda(x, y)\) is modeled as \(\beta_0 + \beta_1 \cdot x + \beta_2 \cdot y\). This is an inhomogeneous Poisson process with a linear intensity in terms of \(x\) and \(y\).

  • AIC is 26118.86

  • Only ‘X’ shows a significant Ztest, meaning spatial intensity is significantly related to the X-direction.

  1. a covariate that represents the distance to the incinerator
## Point process model
## Fitted to data: lung_ppp
## Fitting method: maximum likelihood (Berman-Turner approximation)
## Model was fitted using glm()
## Algorithm converged
## Call:
## ppm.ppp(Q = lung_ppp, trend = ~incin_dist, interaction = Poisson())
## Edge correction: "border"
##  [border correction distance r = 0 ]
## --------------------------------------------------------------------------------
## Quadrature scheme (Berman-Turner) = data + dummy + weights
## 
## Data pattern:
## Planar point pattern:  978 points
## Average intensity 4.31e-06 points per square unit
## Window: polygonal boundary
## single connected closed polygon with 11 vertices
## enclosing rectangle: [346600, 364100] x [412600, 430300] units
##                      (17500 x 17700 units)
## Window area = 226985000 square units
## Fraction of frame area: 0.733
## 
## Dummy quadrature points:
##      70 x 70 grid of dummy points, plus 4 corner points
##      dummy spacing: 250.0000 x 252.8571 units
## 
## Original dummy parameters: =
## Planar point pattern:  3659 points
## Average intensity 1.61e-05 points per square unit
## Window: polygonal boundary
## single connected closed polygon with 11 vertices
## enclosing rectangle: [346600, 364100] x [412600, 430300] units
##                      (17500 x 17700 units)
## Window area = 226985000 square units
## Fraction of frame area: 0.733
## Quadrature weights:
##      (counting weights based on 70 x 70 array of rectangular tiles)
## All weights:
##  range: [4270, 63200]    total: 2.27e+08
## Weights on data points:
##  range: [4270, 31600]    total: 17800000
## Weights on dummy points:
##  range: [4270, 63200]    total: 2.09e+08
## --------------------------------------------------------------------------------
## FITTED :
## 
## Nonstationary Poisson process
## 
## ---- Intensity: ----
## 
## Log intensity: ~incin_dist
## Model depends on external covariate 'incin_dist'
## Covariates provided:
##  incin_dist: distfun
## 
## Fitted trend coefficients:
##   (Intercept)    incin_dist 
## -5.126593e+00 -1.453328e-05 
## 
##                  Estimate         S.E.       CI95.lo      CI95.hi Ztest
## (Intercept) -5.126593e+00 3.778934e+00 -1.253317e+01 2.279981e+00      
## incin_dist  -1.453328e-05 7.600753e-06 -2.943049e-05 3.639179e-07      
##                  Zval
## (Intercept) -1.356624
## incin_dist  -1.912085
## 
## ----------- gory details -----
## 
## Fitted regular parameters (theta):
##   (Intercept)    incin_dist 
## -5.126593e+00 -1.453328e-05 
## 
## Fitted exp(theta):
## (Intercept)  incin_dist 
## 0.005936754 0.999985467
##   (Intercept)    incin_dist 
## -5.126593e+00 -1.453328e-05
## [1] 26120.46

Since incinerator is a single point, we need to create a square window from (xi - r, yi - r) to (xi + r, yi + r), where r is 20 meters. In this way, we can approximate a polygon.

  • Here, \(\lambda(x, y)\) is modeled as \(\exp(\beta_0 + \beta_1 \cdot \text{dist_to_incin}(x, y))\).

  • This is an inhomogeneous Poisson model.

  • The parameter estimate is -1.453328e-05, but is NOT statistically significant. There is no strong evidence that intensity will increase or decrease with distance. In other words, it suggests the distance from the incinerator does not affect the intensity of lung cancers.

Larynx Cancer

  1. no covariates
## Point process model
## Fitted to data: Larynx_ppp
## Fitting method: maximum likelihood
## Model was fitted analytically
## Call:
## ppm.formula(Q = Larynx_ppp ~ 1, interaction = Poisson())
## Edge correction: "border"
##  [border correction distance r = 0 ]
## --------------------------------------------------------------------------------
## Quadrature scheme (Berman-Turner) = data + dummy + weights
## 
## Data pattern:
## Planar point pattern:  58 points
## Average intensity 2.56e-07 points per square unit
## Window: polygonal boundary
## single connected closed polygon with 11 vertices
## enclosing rectangle: [346600, 364100] x [412600, 430300] units
##                      (17500 x 17700 units)
## Window area = 226985000 square units
## Fraction of frame area: 0.733
## 
## Dummy quadrature points:
##      32 x 32 grid of dummy points, plus 4 corner points
##      dummy spacing: 546.875 x 553.125 units
## 
## Original dummy parameters: =
## Planar point pattern:  794 points
## Average intensity 3.5e-06 points per square unit
## Window: polygonal boundary
## single connected closed polygon with 11 vertices
## enclosing rectangle: [346600, 364100] x [412600, 430300] units
##                      (17500 x 17700 units)
## Window area = 226985000 square units
## Fraction of frame area: 0.733
## Quadrature weights:
##      (counting weights based on 32 x 32 array of rectangular tiles)
## All weights:
##  range: [9700, 302000]   total: 2.27e+08
## Weights on data points:
##  range: [60500, 151000]  total: 7960000
## Weights on dummy points:
##  range: [9700, 302000]   total: 2.19e+08
## --------------------------------------------------------------------------------
## FITTED :
## 
## Stationary Poisson process
## 
## ---- Intensity: ----
## 
## 
## Uniform intensity:
## [1] 2.555235e-07
## 
##              Estimate      S.E.   CI95.lo  CI95.hi Ztest      Zval
## log(lambda) -15.17995 0.1313064 -15.43731 -14.9226   *** -115.6071
## 
## ----------- gory details -----
## 
## Fitted regular parameters (theta):
## log(lambda) 
##   -15.17995 
## 
## Fitted exp(theta):
##  log(lambda) 
## 2.555235e-07
## log(lambda) 
##   -15.17995
## Nothing plotted -- all plots selected are flat surfaces.
## [1] 1878.874
  • Since the intensity is constant, the homogeneous Poisson point possess is being fit.

  • The uniform intensity is 2.55e-07, meaning you expect 2.55e-07 lung cancer cases per unit area

  • AIC is 1878.874

  1. linear trend
## Point process model
## Fitted to data: Larynx_ppp
## Fitting method: maximum likelihood (Berman-Turner approximation)
## Model was fitted using glm()
## Algorithm converged
## Call:
## ppm.ppp(Q = Larynx_ppp, trend = ~log(x) + log(y), interaction = Poisson())
## Edge correction: "border"
##  [border correction distance r = 0 ]
## --------------------------------------------------------------------------------
## Quadrature scheme (Berman-Turner) = data + dummy + weights
## 
## Data pattern:
## Planar point pattern:  58 points
## Average intensity 2.56e-07 points per square unit
## Window: polygonal boundary
## single connected closed polygon with 11 vertices
## enclosing rectangle: [346600, 364100] x [412600, 430300] units
##                      (17500 x 17700 units)
## Window area = 226985000 square units
## Fraction of frame area: 0.733
## 
## Dummy quadrature points:
##      32 x 32 grid of dummy points, plus 4 corner points
##      dummy spacing: 546.875 x 553.125 units
## 
## Original dummy parameters: =
## Planar point pattern:  794 points
## Average intensity 3.5e-06 points per square unit
## Window: polygonal boundary
## single connected closed polygon with 11 vertices
## enclosing rectangle: [346600, 364100] x [412600, 430300] units
##                      (17500 x 17700 units)
## Window area = 226985000 square units
## Fraction of frame area: 0.733
## Quadrature weights:
##      (counting weights based on 32 x 32 array of rectangular tiles)
## All weights:
##  range: [9700, 302000]   total: 2.27e+08
## Weights on data points:
##  range: [60500, 151000]  total: 7960000
## Weights on dummy points:
##  range: [9700, 302000]   total: 2.19e+08
## --------------------------------------------------------------------------------
## FITTED :
## 
## Nonstationary Poisson process
## 
## ---- Intensity: ----
## 
## Log intensity: ~log(x) + log(y)
## 
## Fitted trend coefficients:
## (Intercept)      log(x)      log(y) 
##  260.569833  -18.876038   -2.663316 
## 
##               Estimate      S.E.    CI95.lo    CI95.hi Ztest       Zval
## (Intercept) 260.569833 226.54444 -183.44911 704.588772        1.1501930
## log(x)      -18.876038  10.88638  -40.21294   2.460868       -1.7339137
## log(y)       -2.663316  13.01807  -28.17826  22.851626       -0.2045862
## 
## ----------- gory details -----
## 
## Fitted regular parameters (theta):
## (Intercept)      log(x)      log(y) 
##  260.569833  -18.876038   -2.663316 
## 
## Fitted exp(theta):
##   (Intercept)        log(x)        log(y) 
## 1.458951e+113  6.342211e-09  6.971664e-02
## (Intercept)      log(x)      log(y) 
##  260.569833  -18.876038   -2.663316
## [1] 1879.827
  • \(\lambda(x, y)\) is modeled as \(\beta_0 + \beta_1 \cdot x + \beta_2 \cdot y\). This is an inhomogeneous Poisson process with a linear intensity in terms of \(x\) and \(y\).

  • AIC is 1879.827

  • No covariate shows a significant Ztest, meaning spatial intensity is NOT significantly related to both the X-direction and Y-direction.

  1. a covariate that represents the distance to the incinerator
## Point process model
## Fitted to data: Larynx_ppp
## Fitting method: maximum likelihood (Berman-Turner approximation)
## Model was fitted using glm()
## Algorithm converged
## Call:
## ppm.ppp(Q = Larynx_ppp, trend = ~incin_dist, interaction = Poisson())
## Edge correction: "border"
##  [border correction distance r = 0 ]
## --------------------------------------------------------------------------------
## Quadrature scheme (Berman-Turner) = data + dummy + weights
## 
## Data pattern:
## Planar point pattern:  58 points
## Average intensity 2.56e-07 points per square unit
## Window: polygonal boundary
## single connected closed polygon with 11 vertices
## enclosing rectangle: [346600, 364100] x [412600, 430300] units
##                      (17500 x 17700 units)
## Window area = 226985000 square units
## Fraction of frame area: 0.733
## 
## Dummy quadrature points:
##      32 x 32 grid of dummy points, plus 4 corner points
##      dummy spacing: 546.875 x 553.125 units
## 
## Original dummy parameters: =
## Planar point pattern:  794 points
## Average intensity 3.5e-06 points per square unit
## Window: polygonal boundary
## single connected closed polygon with 11 vertices
## enclosing rectangle: [346600, 364100] x [412600, 430300] units
##                      (17500 x 17700 units)
## Window area = 226985000 square units
## Fraction of frame area: 0.733
## Quadrature weights:
##      (counting weights based on 32 x 32 array of rectangular tiles)
## All weights:
##  range: [9700, 302000]   total: 2.27e+08
## Weights on data points:
##  range: [60500, 151000]  total: 7960000
## Weights on dummy points:
##  range: [9700, 302000]   total: 2.19e+08
## --------------------------------------------------------------------------------
## FITTED :
## 
## Nonstationary Poisson process
## 
## ---- Intensity: ----
## 
## Log intensity: ~incin_dist
## Model depends on external covariate 'incin_dist'
## Covariates provided:
##  incin_dist: distfun
## 
## Fitted trend coefficients:
##   (Intercept)    incin_dist 
##  3.8906729971 -0.0000383648 
## 
##                  Estimate         S.E.       CI95.lo      CI95.hi Ztest
## (Intercept)  3.8906729971 1.578276e+01 -2.704297e+01 3.482431e+01      
## incin_dist  -0.0000383648 3.177121e-05 -1.006352e-04 2.390562e-05      
##                   Zval
## (Intercept)  0.2465141
## incin_dist  -1.2075336
## 
## ----------- gory details -----
## 
## Fitted regular parameters (theta):
##   (Intercept)    incin_dist 
##  3.8906729971 -0.0000383648 
## 
## Fitted exp(theta):
## (Intercept)  incin_dist 
##  48.9438145   0.9999616
##   (Intercept)    incin_dist 
##  3.8906729971 -0.0000383648
## [1] 1879.358

Similarly, since incinerator is a single point, we need to create a square window from (xi - r, yi - r) to (xi + r, yi + r), where r is 20 meters. In this way, we can approximate a polygon.

  • Here, \(\lambda(x, y)\) is modeled as \(\exp(\beta_0 + \beta_1 \cdot \text{dist_to_incin}(x, y))\).

  • This is an inhomogeneous Poisson model.

  • The parameter estimate is -0.000038, but is NOT statistically significant. There is no strong evidence that intensity will increase or decrease with distance. In other words, it suggests the distance from the incinerator does not affect the intensity of larynx cancers.

8

lung Cancer

## Stationary Cox point process model
## Fitted to point pattern dataset 'lung_ppp'
## Fitted by minimum contrast
##  Summary statistic: K-function
## Minimum contrast fit (object of class "minconfit")
## Model: Log-Gaussian Cox process
##   Covariance model: exponential
## Fitted by matching theoretical K function to lung_ppp
## 
## Internal parameters fitted by minimum contrast ($par):
##     sigma2      alpha 
##   2.534082 802.515371 
## 
## Fitted covariance parameters:
##        var      scale 
##   2.534082 802.515371 
## Fitted mean of log of random intensity:  -13.6209
## 
## Converged successfully after 87 function evaluations
## 
## Starting values of parameters:
##   sigma2    alpha 
##   1.0000 206.1528 
## Domain of integration: [ 0 , 4375 ]
## Exponents: p= 2, q= 0.25
## 
## ----------- TREND  -----
## Point process model
## Fitted to data: X
## Fitting method: maximum likelihood (Berman-Turner approximation)
## Model was fitted using glm()
## Algorithm converged
## Call:
## ppm.ppp(Q = X, trend = trend, rename.intercept = FALSE, covariates = covariates, 
##     covfunargs = covfunargs, use.gam = use.gam, forcefit = TRUE, 
##     improve.type = ppm.improve.type, improve.args = ppm.improve.args, 
##     nd = nd, eps = eps)
## Edge correction: "border"
##  [border correction distance r = 0 ]
## --------------------------------------------------------------------------------
## Quadrature scheme (Berman-Turner) = data + dummy + weights
## 
## Data pattern:
## Planar point pattern:  978 points
## Average intensity 4.31e-06 points per square unit
## Window: polygonal boundary
## single connected closed polygon with 11 vertices
## enclosing rectangle: [346600, 364100] x [412600, 430300] units
##                      (17500 x 17700 units)
## Window area = 226985000 square units
## Fraction of frame area: 0.733
## 
## Dummy quadrature points:
##      70 x 70 grid of dummy points, plus 4 corner points
##      dummy spacing: 250.0000 x 252.8571 units
## 
## Original dummy parameters: =
## Planar point pattern:  3659 points
## Average intensity 1.61e-05 points per square unit
## Window: polygonal boundary
## single connected closed polygon with 11 vertices
## enclosing rectangle: [346600, 364100] x [412600, 430300] units
##                      (17500 x 17700 units)
## Window area = 226985000 square units
## Fraction of frame area: 0.733
## Quadrature weights:
##      (counting weights based on 70 x 70 array of rectangular tiles)
## All weights:
##  range: [4270, 63200]    total: 2.27e+08
## Weights on data points:
##  range: [4270, 31600]    total: 17800000
## Weights on dummy points:
##  range: [4270, 63200]    total: 2.09e+08
## --------------------------------------------------------------------------------
## FITTED :
## 
## Stationary Poisson process
## 
## ---- Intensity: ----
## 
## 
## Uniform intensity:
## [1] 4.313092e-06
## 
##              Estimate       S.E.   CI95.lo   CI95.hi Ztest     Zval
## (Intercept) -12.35386 0.03197647 -12.41653 -12.29118   *** -386.342
## 
## ----------- gory details -----
## 
## Fitted regular parameters (theta):
## (Intercept) 
##   -12.35386 
## 
## Fitted exp(theta):
##  (Intercept) 
## 4.313092e-06 
## 
## ----------- COX  -----------
## Model: log-Gaussian Cox process
## 
##  Covariance model: exponential
## Fitted covariance parameters:
##        var      scale 
##   2.534082 802.515371 
## Fitted mean of log of random intensity: -13.6209
## 
## Final standard error and CI
## (allowing for correlation of Cox process):
##              Estimate     S.E.   CI95.lo CI95.hi Ztest      Zval
## (Intercept) -12.35386 0.237686 -12.81971 -11.888   *** -51.97552

## Generating 78 simulated realisations of fitted cluster model (39 to estimate 
## the mean and 39 to calculate envelopes) ...
## 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20,
## 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40,
## 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60,
## 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 
## 78.
## 
## Done.

  • From the plot of K-functions, we can see the observed K(r) (red line) is above the theoretical K(r) (green line), meaning there is clustering.

  • From the plot of G-functions, we can see observed G(r) is above the theoretical G(r), meaning the point pattern is clustering.

  • The result align with previous results

  • The Mean log‐intensity is −13.6209

  • A non‐zero variance(2.53) indicates that some locations have much higher log‐intensity than others, in other words, clusters exists.

Larynx Cancer

## Stationary Cox point process model
## Fitted to point pattern dataset 'Larynx_ppp'
## Fitted by minimum contrast
##  Summary statistic: K-function
## Minimum contrast fit (object of class "minconfit")
## Model: Log-Gaussian Cox process
##   Covariance model: exponential
## Fitted by matching theoretical K function to Larynx_ppp
## 
## Internal parameters fitted by minimum contrast ($par):
##     sigma2      alpha 
##   2.011052 693.817641 
## 
## Fitted covariance parameters:
##        var      scale 
##   2.011052 693.817641 
## Fitted mean of log of random intensity:  -16.18522
## 
## Converged successfully after 61 function evaluations
## 
## Starting values of parameters:
##   sigma2    alpha 
##    1.000 1442.361 
## Domain of integration: [ 0 , 4375 ]
## Exponents: p= 2, q= 0.25
## 
## ----------- TREND  -----
## Point process model
## Fitted to data: X
## Fitting method: maximum likelihood (Berman-Turner approximation)
## Model was fitted using glm()
## Algorithm converged
## Call:
## ppm.ppp(Q = X, trend = trend, rename.intercept = FALSE, covariates = covariates, 
##     covfunargs = covfunargs, use.gam = use.gam, forcefit = TRUE, 
##     improve.type = ppm.improve.type, improve.args = ppm.improve.args, 
##     nd = nd, eps = eps)
## Edge correction: "border"
##  [border correction distance r = 0 ]
## --------------------------------------------------------------------------------
## Quadrature scheme (Berman-Turner) = data + dummy + weights
## 
## Data pattern:
## Planar point pattern:  58 points
## Average intensity 2.56e-07 points per square unit
## Window: polygonal boundary
## single connected closed polygon with 11 vertices
## enclosing rectangle: [346600, 364100] x [412600, 430300] units
##                      (17500 x 17700 units)
## Window area = 226985000 square units
## Fraction of frame area: 0.733
## 
## Dummy quadrature points:
##      32 x 32 grid of dummy points, plus 4 corner points
##      dummy spacing: 546.875 x 553.125 units
## 
## Original dummy parameters: =
## Planar point pattern:  794 points
## Average intensity 3.5e-06 points per square unit
## Window: polygonal boundary
## single connected closed polygon with 11 vertices
## enclosing rectangle: [346600, 364100] x [412600, 430300] units
##                      (17500 x 17700 units)
## Window area = 226985000 square units
## Fraction of frame area: 0.733
## Quadrature weights:
##      (counting weights based on 32 x 32 array of rectangular tiles)
## All weights:
##  range: [9700, 302000]   total: 2.27e+08
## Weights on data points:
##  range: [60500, 151000]  total: 7960000
## Weights on dummy points:
##  range: [9700, 302000]   total: 2.19e+08
## --------------------------------------------------------------------------------
## FITTED :
## 
## Stationary Poisson process
## 
## ---- Intensity: ----
## 
## 
## Uniform intensity:
## [1] 2.555892e-07
## 
##              Estimate      S.E.   CI95.lo   CI95.hi Ztest      Zval
## (Intercept) -15.17969 0.1313064 -15.43705 -14.92234   *** -115.6051
## 
## ----------- gory details -----
## 
## Fitted regular parameters (theta):
## (Intercept) 
##   -15.17969 
## 
## Fitted exp(theta):
##  (Intercept) 
## 2.555892e-07 
## 
## ----------- COX  -----------
## Model: log-Gaussian Cox process
## 
##  Covariance model: exponential
## Fitted covariance parameters:
##        var      scale 
##   2.011052 693.817641 
## Fitted mean of log of random intensity: -16.18522
## 
## Final standard error and CI
## (allowing for correlation of Cox process):
##              Estimate      S.E.   CI95.lo   CI95.hi Ztest      Zval
## (Intercept) -15.17969 0.2310018 -15.63245 -14.72694   *** -65.71245

## Generating 78 simulated realisations of fitted cluster model (39 to estimate 
## the mean and 39 to calculate envelopes) ...
## 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20,
## 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40,
## 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60,
## 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 
## 78.
## 
## Done.

  • From the plot of K-functions, we can see the observed K(r) (red line) is above, but very close to theoretical K(r) (green line), meaning there is a little bit clustering. The point pattern is mainly random.

  • From the plot of G-functions, we can see observed G(r) is very close to theoretical G(r), meaning the point pattern is mainly random.

  • The result align with previous results

  • The Mean log‐intensity is −16.18

  • A non‐zero variance(2.011) indicates that some locations have much higher log‐intensity than others, in other words, clusters exists.

9

a

when minPts = 3 and eps = 2km

## DBSCAN clustering for 978 objects.
## Parameters: eps = 2000, minPts = 3
## Using euclidean distances and borderpoints = TRUE
## The clustering contains 1 cluster(s) and 2 noise points.
## 
##   0   1 
##   2 976 
## 
## Available fields: cluster, eps, minPts, metric, borderPoints

when minPts = 3 and eps = 0.5km

## DBSCAN clustering for 978 objects.
## Parameters: eps = 500, minPts = 3
## Using euclidean distances and borderpoints = TRUE
## The clustering contains 21 cluster(s) and 76 noise points.
## 
##   0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19 
##  76 222 258 163   8  22  28  48   7   5  17  23   3   7  26  18  24   8   6   3 
##  20  21 
##   3   3 
## 
## Available fields: cluster, eps, minPts, metric, borderPoints

minPts = 3 and eps = 2km minPts = 3 and eps = 0.5km
Result All points into one cluster except two points. The chosen parameters considered most points to be part of one big polygon. There are more clusters, capturing more local clusters rather than forcing one giant cluster.
Number of clusters 1 21
Number of noise points 2 76

The number of clusters can be very different when eps is very different. Tuning this parameter can leading to different results and interpretation, so that more patterns can be discovered and analyzed.

b

## HDBSCAN clustering for 978 objects.
## Parameters: minPts = 3
## The clustering contains 32 cluster(s) and 55 noise points.
## 
##   0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19 
##  55   9  12   3  23 174   9   7   6   9   3   3   8   5   3   6  33  15  19  42 
##  20  21  22  23  24  25  26  27  28  29  30  31  32 
##  35   3  14  13   3  27   4  18   5   3  26   5 378 
## 
## Available fields: cluster, minPts, coredist, cluster_scores,
##                   membership_prob, outlier_scores, hc

  • 32 clusters and 55 noise points

  • Instead of a fixed eps, HDBSCAN can use multiple values of eps and builds a hierarchical tree. This can reveal clusters at different density levels.

  • From the hierarchical tree, we can see: at small eps levels, there are more clusters. As you increases eps, some clusters merge into larger clusters.

  • By HDBSCAN, different regions of the domain can end up split or merged at different values of eps. So the the final clusters can be seen as forming at locally appropriate density levels.